Integrating diverse genomic data using gene sets

Author: Karchin Rachel
Marchionni Luigi
Parmigiani Giovanni
Tyekucheva Svitlana
Publication venue: BioMed Central
Publication date: 01/01/2011

We introduce and evaluate data analysis methods to interpret simultaneous measurement of multiple genomic features made on the same biological samples. Our tools use gene sets to provide an interpretable common scale for diverse genomic information. We show we can detect genetic effects, although they may act through different mechanisms in different samples, and show we can discover and validate important disease-related gene sets that would not be discovered by analyzing each data type individually.

Crossref

Harvard University - DASH

Springer - Publisher Connector

PubMed Central

Classifying Variants of Undetermined Significance in BRCA2 with Protein Likelihood Ratios

Author: Agarwal Mukesh
Beattie Mary S.
Couch Fergus
Karchin Rachel
Sali Andrej
Publication venue: Libertas Academica
Publication date: 01/01/2008

Background: Missense (amino-acid changing) variants found in cancer predisposition genes often create difficulties when clinically interpreting genetic testing results. Although bioinformatics has developed approaches to predicting the impact of these variants, many of these approaches have not been readily applicable in the clinical setting. Bioinformatics approaches for predicting the impact of these variants have not yet found their footing in clinical practice because 1) interpreting the medical relevance of predictive scores is difficult; 2) the relationship between bioinformatics “predictors” (sequence conservation, protein structure) and cancer susceptibility is not understood. Methodology/Principal Findings: We present a computational method that produces a probabilistic likelihood ratio predictive of whether a missense variant impairs protein function. We apply the method to a tumor suppressor gene, BRCA2, whose loss of function is important to cancer susceptibility. Protein likelihood ratios are computed for 229 unclassified variants found in individuals from high-risk breast/ovarian cancer families. We map the variants onto a protein structure model, and suggest that a cluster of predicted deleterious variants in the BRCA2 OB1 domain may destabilize BRCA2 and a protein binding partner, the small acidic protein DSS1. We compare our predictions with variant “re-classifications” provided by Myriad Genetics, a biotechnology company that holds the patent on BRCA2 genetic testing in the U.S., and with classifications made by an established medical genetics model [1]. Our approach uses bioinformatics data that is independent of these genetics-based classifications and yet shows significant agreement with them. Preliminary results indicate that our method is less likely to make false positive errors than other bioinformatics methods, which were designed to predict the impact of missense mutations in general. Conclusions/Significance: Missense mutations are the most common disease-producing genetic variants. We present a fast, scalable bioinformatics method that integrates information about protein sequence, conservation, and structure in a likelihood ratio that can be integrated with medical genetics likelihood ratios. The protein likelihood ratio, together with medical genetics likelihood ratios, can be used by clinicians and counselors to communicate the relevance of a VUS to the individual who has that VUS. The approach described here is generalizable to regions of any tumor suppressor gene that have been structurally determined by X-ray crystallography or for which a protein homology model can be built.
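
The idea of integrating a protein likelihood ratio with medical genetics likelihood ratios can be illustrated with Bayes' rule in odds form. The following is a minimal sketch, not the authors' implementation: the function name, the prior, and the example ratios are hypothetical, and multiplying the ratios assumes the evidence sources are conditionally independent.

```python
from math import prod


def posterior_probability(prior, likelihood_ratios):
    """Combine a prior probability of pathogenicity with likelihood ratios.

    Assumption (not stated in the abstract): each likelihood ratio comes
    from an independent line of evidence, so the ratios can be multiplied.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if any(lr <= 0.0 for lr in likelihood_ratios):
        raise ValueError("likelihood ratios must be positive")

    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    # Convert odds back to a probability.
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical values: a 10% prior, a protein likelihood ratio of 6,
    # and a genetics-based likelihood ratio of 3.
    print(posterior_probability(0.10, [6.0, 3.0]))
```

Working in odds keeps each source of evidence as a separate multiplicative factor, which is what makes ratios from different models straightforward to combine.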

Directory of Open Access Journals

PubMed Central

MODBASE, a database of annotated comparative protein structure models and associated resources.

Author: Barkan David T
Carter Hannah
Davis Fred P
Eramian David
Eswar Narayanan
Karchin Rachel
Kelly Libusha
Mankoo Parminder
Marti-Renom Marc A
Pieper Ursula
Sali Andrej
Webb Ben M
Publication venue: eScholarship, University of California
Publication date: 23/10/2008

MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE currently contains 5,152,695 reliable models for domains in 1,593,209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server (http://salilab.org/modweb). Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/).

PubMed Central

eScholarship - University of California

CHASM and SNVBox: toolkit for detecting biologically important single nucleotide mutations in cancer

Author: Dewey Kim
Hannah Carter
Mark Diekhans
Michael C. Ryan
Rachel Karchin
Wing Chung Wong
Publication venue: Oxford University Press
Publication date: 01/01/2011

Summary: Thousands of cancer exomes are currently being sequenced, yielding millions of non-synonymous single nucleotide variants (SNVs) of possible relevance to disease etiology. Here, we provide a software toolkit to prioritize SNVs based on their predicted contribution to tumorigenesis. It includes a database of precomputed, predictive features covering all positions in the annotated human exome and can be used either stand-alone or as part of a larger variant discovery pipeline.

CiteSeerX

Crossref

PubMed Central

Detecting species-site dependencies in large multiple sequence alignments

Author: Dandekar Thomas
Huenerberg Mirja
Karchin Rachel
Müller Tobias
Müller-Reible Clemens
Rahmann Sven
Schoen Christoph
Schultz Jörg
Schwarz Roland
Seibel Philipp N.
Publication venue: Oxford University Press
Publication date: 01/10/2009

Multiple sequence alignments (MSAs) are one of the most important sources of information in sequence analysis. Many methods have been proposed to detect, extract and visualize their most significant properties. To the same extent that site-specific methods like sequence logos successfully visualize site conservation and sequence-based methods like clustering approaches detect relationships between sequences, both types of methods fail at revealing informational elements of MSAs at the level of sequence–site interactions, i.e. finding clusters of sequences and sites responsible for their clustering, which together account for a high fraction of the overall information of the MSA. To fill this gap, we present here a method that combines the Fisher score-based embedding of sequences from a profile hidden Markov model (pHMM) with correspondence analysis. This method is capable of detecting and visualizing group-specific or conflicting signals in an MSA and allows for a detailed explorative investigation of alignments of any size tractable by pHMMs. Applications of our methods are exemplified on an alignment of the Neisseria surface antigen LP2086, where it is used to detect sites of recombinatory horizontal gene transfer and on the vitamin K epoxide reductase family to distinguish between evolutionary and functional signals.

PubMed Central

MDC Repository

Identifying Mendelian disease genes with the Variant Effect Scoring Tool

Author: Carter Hannah
Cooper David Neil
Douville Christopher
Karchin Rachel
Stenson Peter D.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Background: Whole exome sequencing studies identify hundreds to thousands of rare protein coding variants of ambiguous significance for human health. Computational tools are needed to accelerate the identification of specific variants and genes that contribute to human disease. Results: We have developed the Variant Effect Scoring Tool (VEST), a supervised machine learning-based classifier, to prioritize rare missense variants with likely involvement in human disease. The VEST classifier training set comprised ~45,000 disease mutations from the latest Human Gene Mutation Database release and another ~45,000 high frequency (allele frequency > 1%) putatively neutral missense variants from the Exome Sequencing Project. VEST outperforms some of the most popular methods for prioritizing missense variants in carefully designed holdout benchmarking experiments (VEST ROC AUC = 0.91, PolyPhen2 ROC AUC = 0.86, SIFT4.0 ROC AUC = 0.84). VEST estimates variant score p-values against a null distribution of VEST scores for neutral variants not included in the VEST training set. These p-values can be aggregated at the gene level across multiple disease exomes to rank genes for probable disease involvement. We tested the ability of an aggregate VEST gene score to identify candidate Mendelian disease genes, based on whole-exome sequencing of a small number of disease cases. We used whole-exome data for two Mendelian disorders for which the causal gene is known. Considering only genes that contained variants in all cases, the VEST gene score ranked dihydroorotate dehydrogenase (DHODH) number 2 of 2253 genes in four cases of Miller syndrome, and myosin-3 (MYH3) number 2 of 2313 genes in three cases of Freeman-Sheldon syndrome. Conclusions: Our results demonstrate the potential power gain of aggregating bioinformatics variant scores into gene-level scores and the general utility of bioinformatics in assisting the search for disease genes in large-scale exome sequencing studies.
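
The two steps named in this abstract, scoring a variant against a null distribution and aggregating p-values per gene, can be sketched as follows. This is an illustration under stated assumptions rather than the VEST implementation: the abstract does not say how p-values are aggregated, so Fisher's combined probability test is swapped in here as one common choice, and all function names and example numbers are hypothetical.

```python
import math
from bisect import bisect_left


def empirical_p_value(score, sorted_null_scores):
    """Fraction of null (neutral-variant) scores at least as large as `score`.

    A +1 pseudocount keeps the p-value strictly positive, so its logarithm
    is always defined. `sorted_null_scores` must be sorted ascending.
    """
    n = len(sorted_null_scores)
    n_at_least = n - bisect_left(sorted_null_scores, score)
    return (n_at_least + 1) / (n + 1)


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method (an assumption here).

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom; for even degrees of freedom the survival function
    has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    half_stat = -sum(math.log(p) for p in p_values)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Hypothetical null scores and three variant scores observed in one gene.
    null_scores = sorted([0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.75, 0.90])
    variant_scores = [0.80, 0.95, 0.70]
    p_values = [empirical_p_value(s, null_scores) for s in variant_scores]
    print(p_values, fisher_combined_p(p_values))
```

Genes can then be ranked by the combined p-value, smallest first. Fisher's method assumes the per-variant p-values are independent, which may not hold for variants observed in related cases.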

Online Research @ Cardiff

Springer - Publisher Connector

PubMed Central

MODBASE: a database of annotated comparative protein structure models and associated resources

Author: Braberg Hannes
Davis Fred P.
Eramian David
Eswar Narayanan
Karchin Rachel
Kelly Libusha
Madhusudhan M. S.
Marti-Renom Marc
Melo Francisco
Pieper Ursula
Rossi Andrea
Sali Andrej
Shen Min-Yi
Webb Ben M.
Publication venue: Oxford University Press
Publication date: 28/12/2005

MODBASE is a database of annotated comparative protein structure models for all available protein sequences that can be matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on MODELLER for fold assignment, sequence–structure alignment, model building and model assessment. MODBASE is updated regularly to reflect the growth in protein sequence and structure databases, and improvements in the software for calculating the models. MODBASE currently contains 3 094 524 reliable models for domains in 1 094 750 out of 1 817 889 unique protein sequences in the UniProt database (July 5, 2005); only models based on statistically significant alignments and models assessed to have the correct fold despite insignificant alignments are included. MODBASE also allows users to generate comparative models for proteins of interest with the automated modeling server MODWEB. Our other resources integrated with MODBASE include comprehensive databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites and structurally defined binary domain interfaces (PIBASE) as well as predictions of ligand binding sites, interactions between yeast proteins, and functional consequences of human nsSNPs (LS-SNP).

Crossref

PubMed Central

Determination of cancer risk associated with germ line BRCA1 missense variants by functional analysis

Author: Baumbach Lisa
Carvalho Marcelo
Couch Fergus
Gayol Luis
Goldgar David
Grist Scott Andrew
Karchin Rachel
Manoukian Siranoush
Marsillac Sylvia
Monteiro Alvaro
Nathanson Katherine
Pickard-Brzosowicz Jennifer
Radice Paolo
Rondinelli Edson
Sali Andrej
Silva Rosane
Sutphen Rebecca
Swaby Ramona
Urmenyi Turan
Publication venue: American Association for Cancer Research (AACR)
Publication date: 01/01/2007

©2007 American Association for Cancer Research. Germ line inactivating mutations in BRCA1 confer susceptibility for breast and ovarian cancer. However, the relevance of the many missense changes in the gene for which the effect on protein function is unknown remains unclear. Determination of which variants are causally associated with cancer is important for assessment of individual risk. We used a functional assay that measures the transactivation activity of BRCA1 in combination with analysis of protein modeling based on the structure of BRCA1 BRCT domains. In addition, the information generated was interpreted in light of genetic data. We determined the predicted cancer association of 22 BRCA1 variants and verified that the common polymorphism S1613G has no effect on BRCA1 function, even when combined with other rare variants. We estimated the specificity and sensitivity of the assay, and by meta-analysis of 47 variants, we show that variants with less than 45% of the wild-type activity can be classified as deleterious, whereas variants with more than 50% can be classified as neutral. In conclusion, we did functional and structure-based analyses on a large series of BRCA1 missense variants and defined a tentative threshold activity for the classification of missense variants. By interpreting the validated functional data in light of additional clinical and structural evidence, we conclude that it is possible to classify all missense variants in the BRCA1 COOH-terminal region. These results bring functional assays for BRCA1 closer to clinical applicability. [Cancer Res 2007;67(4):1494–501]

Crossref

PubMed Central

Flinders Academic Commons

Assessing the pathogenicity of insertion and deletion variants with the Variant Effect Scoring Tool (VEST-Indel)

Author: Cooper David Neil
Douville Christopher
Gygax Derek M.
Karchin Rachel
Kim Rick
Masica David L.
Ryan Michael
Stenson Peter Daniel
Publication venue: Wiley
Publication date: 26/10/2015

Insertion/deletion variants (indels) alter protein sequence and length, yet are highly prevalent in healthy populations, presenting a challenge to bioinformatics classifiers. Commonly used features (DNA and protein sequence conservation, indel length, and occurrence in repeat regions) are useful for inference of protein damage. However, these features can cause false positives when predicting the impact of indels on disease. Existing methods for indel classification suffer from low specificities, severely limiting clinical utility. Here, we further develop our variant effect scoring tool (VEST) to include the classification of in-frame and frameshift indels (VEST-indel) as pathogenic or benign. We apply 24 features, including a new “PubMed” feature, to estimate a gene's importance in human disease. When compared with four existing indel classifiers, our method achieves a drastically reduced false-positive rate, improving specificity by as much as 90%. This approach of estimating gene importance might be generally applicable to missense and other bioinformatics pathogenicity predictors, which often fail to achieve high specificity. Finally, we tested all possible meta-predictors that can be obtained from combining the four different indel classifiers using Boolean conjunctions and disjunctions, and derived a meta-predictor with improved performance over any individual method.
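
The exhaustive search over Boolean combinations mentioned at the end of this abstract can be sketched by closing the set of base classifiers under AND and OR. This is a hypothetical illustration, not the authors' code: the classifier labels `c0` to `c3` and both function names are invented here, and each classifier is assumed to emit a binary call (1 = pathogenic, 0 = benign).

```python
from itertools import product


def enumerate_meta_predictors(n_classifiers=4):
    """Return {truth_table: expression} for every AND/OR combination.

    A truth table is a tuple with one 0/1 output per combination of base
    classifier calls, ordered as itertools.product((0, 1), repeat=n).
    """
    rows = list(product((0, 1), repeat=n_classifiers))
    # Start from the base classifiers themselves.
    found = {
        tuple(row[i] for row in rows): f"c{i}"
        for i in range(n_classifiers)
    }
    frontier = list(found)
    # Repeatedly combine newly found tables with everything known so far
    # until no new truth table appears.
    while frontier:
        new_tables = []
        for a in frontier:
            for b in list(found):
                conj = tuple(x & y for x, y in zip(a, b))
                disj = tuple(x | y for x, y in zip(a, b))
                for table, op in ((conj, "AND"), (disj, "OR")):
                    if table not in found:
                        found[table] = f"({found[a]} {op} {found[b]})"
                        new_tables.append(table)
        frontier = new_tables
    return found


def predict(table, calls):
    """Apply one meta-predictor to a tuple of 0/1 calls from the base classifiers."""
    index = int("".join(str(c) for c in calls), 2)
    return table[index]


if __name__ == "__main__":
    meta = enumerate_meta_predictors(4)
    print(len(meta), "distinct meta-predictors")
```

Each distinct truth table is stored once, so logically equivalent expressions are not evaluated twice. Choosing the best meta-predictor would then mean scoring every table on labeled benchmark variants with whatever performance metric the study uses, which the abstract does not specify.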

Online Research @ Cardiff

PubMed Central

Autologous reconstitution of human cancer and immune system in vivo

Author: Chung Christine H.
Fu Juan
Hayes D. Neil
Karchin Rachel
Kim Young J.
Masica David L.
Pardoll Drew
Sen Rupashree
Walter Vonn
Publication venue
Publication date: 19/12/2016

Correlative studies from checkpoint inhibitor trials have indicated that better understanding of human leukocytic trafficking into the human tumor microenvironment can expedite the translation of future immune-oncologic agents. In order to directly characterize signaling pathways that can regulate human leukocytic trafficking into the tumor, we have developed a completely autologous xenotransplantation method to reconstitute the human tumor immune microenvironment in vivo. We were able to genetically mark the engrafted CD34+ bone marrow cells as well as the tumor cells, and follow the endogenous leukocytic infiltration into the autologous tumor. To investigate human tumor intrinsic factors that can potentially regulate the immune cells in our system, we silenced STAT3 signaling in the tumor compartment. As expected, STAT3 signaling suppression in the tumor compartment in these autologously reconstituted humanized mice showed increased tumor infiltrating lymphocytes and reduction of arginase-1 in the stroma, which were associated with slower tumor growth rate. We also used this novel system to characterize human myeloid suppressor cells as well as to screen novel agents that can alter endogenous leukocytic infiltration into the tumor. Taken together, we present a valuable method to study individualized human tumor microenvironments in vivo without confounding allogeneic responses.

PubMed Central

Carolina Digital Repository